Efficient Embedding of MPI Collectives in MXNET DAGs for scaling Deep Learning
Abstract
The availability of high-performance computing infrastructure such as clusters of GPUs and CPUs has fueled the growth of distributed learning systems. Deep Learning frameworks express neural nets as DAGs and execute these DAGs on computation resources such as GPUs. In this paper, we propose efficient designs for embedding MPI collective operations into data-parallel DAGs. Incorrect designs can easily lead to deadlocks or program crashes. In particular, we demonstrate three designs for using MPI collectives with DAGs: Funneled, Concurrent Communication, and Dependency Chaining. These designs automatically enable overlap of computation with communication by allowing the collectives to execute concurrently with the other tasks. We implement these designs directly in the KVStore API of MXNET, which lets us reuse the rest of the framework's infrastructure unchanged. Using the ImageNet and CIFAR data sets, we show the potential of our designs. In particular, our designs scale to 256 GPUs, with epoch times as low as 50 seconds on the ImageNet 1K dataset.
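The deadlock risk mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it uses neither MPI nor the MXNET KVStore. It simulates ranks as Python threads and assumes that "Funneled" means one dedicated communication thread per rank, which is how the term is used in MPI's thread-support levels. The class `SimulatedAllreduce`, the parameter names in `KEYS`, and the canonical sorted-key ordering are all hypothetical choices made for illustration. The point is only that when gradients become ready in a different order on each rank, every rank must still enter the collectives in the same order.

```python
import queue
import threading

NUM_RANKS = 4
KEYS = ["fc_weight", "conv_weight", "fc_bias"]  # hypothetical parameter names


class SimulatedAllreduce:
    """Stand-in for a sum-allreduce across threads (assumption: not real MPI)."""

    def __init__(self, num_ranks):
        self.num_ranks = num_ranks
        self.slots = [None] * num_ranks
        self.barrier = threading.Barrier(num_ranks)

    def allreduce(self, rank, value):
        self.slots[rank] = value
        self.barrier.wait()   # every rank has written its contribution
        total = sum(self.slots)
        self.barrier.wait()   # every rank has read before the slots are reused
        return total


def comm_thread(rank, comm, inbox, results):
    """Single communication thread for one rank (the 'funneled' assumption)."""
    pending = {}
    for key in sorted(KEYS):  # canonical order shared by every rank
        while key not in pending:
            ready_key, grad = inbox.get()  # blocks until a gradient is ready
            pending[ready_key] = grad
        summed = comm.allreduce(rank, pending.pop(key))
        results[rank][key] = summed / comm.num_ranks  # averaged gradient


def run():
    comm = SimulatedAllreduce(NUM_RANKS)
    results = [dict() for _ in range(NUM_RANKS)]
    threads = []
    for rank in range(NUM_RANKS):
        inbox = queue.Queue()
        # Each rank produces its gradients in a different order, as a DAG
        # scheduler might; the fake gradient value is rank + key index.
        shift = rank % len(KEYS)
        for key in KEYS[shift:] + KEYS[:shift]:
            inbox.put((key, float(rank + KEYS.index(key))))
        t = threading.Thread(target=comm_thread, args=(rank, comm, inbox, results))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run()[0])
```

If each rank instead called `allreduce` in the order its gradients arrived, different ranks would wait inside collectives for different keys, which is one way the mismatched ordering can silently mix values or hang in a real MPI program.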
Similar resources
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Existing Deep Learning frameworks exclusively use either the Parameter Server (PS) approach or MPI parallelism. In this paper, we discuss the drawbacks of such approaches and propose a generic framework supporting both PS and MPI programming paradigms, co-existing at the same time. The key advantage of the new model is to embed the scaling benefits of MPI parallelism into the loosely coupled PS task...
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
MXNet is a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It offers auto differentiation to derive gradients. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mob...
Detecting Overlapping Communities in Social Networks using Deep Learning
In network analysis, a community is typically considered to be a group of nodes with a high density of edges among themselves and a low density of edges to other parts of the network. Detecting a community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...
Pig Identification Based on MXNet
We attempted the pig identification challenge using MXNet, Amazon's new deep learning framework. By using the feature-based transfer learning method with fine-tuning, the features of the pre-trained Convolutional Neural Network (CNN) model were stitched together with the random weight output layer, and then input Target data and update the new fine-tuning model with a lower learning rat...
Open MPI for Cray XE/XK Systems
Open MPI provides an implementation of the MPI standard supporting communication over a range of high-performance network interfaces. Recently, Oak Ridge National Laboratory (ORNL) and Los Alamos National Laboratory (LANL) collaborated on creating a port of Open MPI for Gemini, the network interface for Cray XE and XK systems. In this paper, we present our design and implementation of Open MPI’s...
Journal: CoRR
Volume: abs/1802.06949
Publication date: 2018